Data Visualisation Portfolio

Author

Student ID: 06119158

Published

January 3, 2026

Introduction

This e-portfolio documents my work for the SQ4012 Data Visualisation module. It demonstrates my ability to plan, execute, and refine data visualisations using R and the Grammar of Graphics. The portfolio is structured into four key tasks illustrating the journey from theoretical design to dynamic animation.


Task 1: Planning a Graphic (The SPEC)

The Specification

Below is a structured plan for the graphic implemented in Task 3. This specification acts as a blueprint, ensuring the visualization is well-defined before any code is written.

1. Data

  • Source: The diamonds dataset (built into ggplot2).
  • Variables:
    • carat (Quantitative): The weight of the diamond.
    • price (Quantitative): The price in US dollars.
    • cut (Ordinal): Quality of the cut (Fair, Good, Very Good, Premium, Ideal).
    • clarity (Ordinal): A measurement of how clear the diamond is.

2. Aesthetics

  • X-Axis: Mapped to carat to show the independent variable (weight).
  • Y-Axis: Mapped to price to show the dependent variable (cost).
  • Color: Mapped to cut. This uses a discrete color scale to visually distinguish between quality tiers within the scatter plot.
  • Alpha (Transparency): Set to 0.6 to mitigate overplotting, as the dataset is large.

3. Geometries

  • geom_point: Used to visualize the individual data points, allowing us to see the distribution and density of diamonds.
  • geom_smooth: A Loess smoothing line is added to summarize the trend between weight and price.

4. Scales

  • Color Scale: A qualitative palette (Brewer ‘Dark2’) is chosen to ensure distinctness between the cut categories without implying a false numeric relationship.
  • Y-Axis Scale: A continuous linear scale formatted with dollar signs ($) for readability.

5. Coordinates

  • System: A Cartesian coordinate system (standard x/y plotting). This is chosen because it allows for precise comparison of position along the axes.

6. Guides and Facets

  • Legend: A legend for ‘Cut’ will be placed on the right to assist in decoding the color mapping.
  • Faceting: The plot will be faceted by clarity. This creates “small multiples” to allow comparison across different clarity grades without overcrowding a single plot.

Reflection on Reproducibility

Writing a specification (SPEC) like the one above is a critical step in reproducible data science. By separating the design (the logical mapping of data to visuals) from the execution (the actual R code), we ensure that the visualization plan is robust.

If I had to switch tools, for example from R (ggplot2) to Python (matplotlib) or Tableau, this SPEC would remain valid because it describes the grammar of the graphic rather than the syntax of the tool. Furthermore, explicitly planning scales and coordinates beforehand forces the designer to catch design issues, such as overplotting or colorblind-unfriendly palettes, before any time is spent writing and debugging code.
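
One way to make this separation of design and execution concrete is to hold the SPEC as plain data and treat the plotting code as one possible "backend" for it. The sketch below is illustrative only: the field names in spec and the helper build_plot are hypothetical, not part of ggplot2 or of the module materials.

```r
library(ggplot2)

# Hypothetical, tool-agnostic encoding of the SPEC as a plain list.
# The field names are assumptions made for this illustration.
spec <- list(
  x       = "carat",
  y       = "price",
  colour  = "cut",
  facet   = "clarity",
  alpha   = 0.6,
  palette = "Dark2"
)

# Hypothetical helper: one possible ggplot2 translation of the spec.
# Another tool would need its own translation, but could read the same list.
build_plot <- function(spec, data) {
  ggplot(data, aes(x = .data[[spec$x]], y = .data[[spec$y]])) +
    geom_point(aes(colour = .data[[spec$colour]]), alpha = spec$alpha) +
    facet_wrap(spec$facet) +
    scale_colour_brewer(palette = spec$palette) +
    labs(x = spec$x, y = spec$y, colour = spec$colour)
}

p <- build_plot(spec, diamonds)
```

Changing a design decision, such as the palette or the faceting variable, then means editing the list rather than the plotting code.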


Task 2: Grammar of Graphics

Introduction

The “Grammar of Graphics,” introduced by Leland Wilkinson and popularized by Hadley Wickham through ggplot2, provides a consistent framework for describing statistical graphics. Rather than thinking of charts as a fixed “menagerie” (e.g., a “bar chart” vs. a “pie chart”), the Grammar of Graphics allows us to build charts component by component. This essay defines the key elements of this grammar.

Core Components

1. Aesthetics

Aesthetics are the visual properties of the objects drawn on a plot, the channels through which the eye reads data. In the grammar, we “map” data variables to aesthetic attributes such as x-position, y-position, colour, size, and shape. For example, in a scatter plot, we might map engine size to the x-axis (position) and fuel efficiency to the y-axis (position). Aesthetics are the bridge between raw numbers and visual properties.
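
As a small illustration of the engine-size example, the sketch below uses the mpg dataset that ships with ggplot2 (an assumption on my part; the essay does not name a dataset). It contrasts mapping an aesthetic to a variable inside aes() with setting it to a constant outside aes().

```r
library(ggplot2)

# Mapping: data variables drive the aesthetics (inside aes()).
# displ = engine size, hwy = highway fuel efficiency, class = vehicle type.
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# Setting: a constant applied to every point (outside aes()).
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "steelblue")
```

In the first plot colour carries information and gets a legend; in the second it is purely decorative.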

2. Scales and Transformations

Scales control the mapping from data space to aesthetic space. If a variable ranges from 0 to 1000, the scale determines how that range fits onto a computer screen (e.g., 0 pixels to 500 pixels). Scales also handle transformations, such as applying a logarithmic transformation so that exponential growth appears as a straight line, or mapping categories (like “Male/Female”) to specific colors (like “Blue/Red”).
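
The arithmetic behind these three kinds of scale can be sketched in base R. The helper linear_scale below is hypothetical and only mimics what a plotting library does internally; it is not a ggplot2 function.

```r
# Hypothetical helper: linearly map values from a data range to a pixel range.
linear_scale <- function(x, from = range(x), to = c(0, 500)) {
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

# Continuous scale: data in [0, 1000] mapped onto 0 to 500 pixels.
values <- c(0, 250, 1000)
pixels <- linear_scale(values, from = c(0, 1000))  # 0 125 500

# Log transformation: equal ratios in the data become equal distances on screen.
growth <- c(1, 10, 100, 1000)
log_pixels <- linear_scale(log10(growth))

# Discrete scale: a lookup table from categories to colours.
sex_palette <- c(Male = "blue", Female = "red")
point_colours <- unname(sex_palette[c("Female", "Male", "Female")])
```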

3. Geometries

Geometries (or “geoms”) represent the actual geometric objects drawn on the plot to represent data. Common geometries include points (for scatter plots), bars (for bar charts), and lines (for time series). A single plot can contain multiple geometries; for instance, a chart might layer a geom_point (showing raw data) underneath a geom_smooth (showing a statistical trend line).
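
A minimal sketch of this layering, again assuming the mpg dataset from ggplot2 for illustration: the point layer draws one mark per row of data, while the smooth layer draws a curve computed from a statistical summary.

```r
library(ggplot2)

p_layers <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Layer 1: raw data, one point per car
  geom_point(alpha = 0.5) +
  # Layer 2: LOESS trend drawn on top of the points
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# layer_data() returns the data each layer actually draws
n_points <- nrow(layer_data(p_layers, 1))
n_curve  <- nrow(layer_data(p_layers, 2))
```

The order of the + calls is the drawing order, so the trend line sits above the points.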

4. Coordinates

The coordinate system defines the physical space in which the data is drawn. The most common system is Cartesian (defined by x and y axes at right angles). However, the grammar allows for alternative systems like Polar coordinates. A pie chart, for example, is a single stacked bar drawn in polar coordinates rather than Cartesian ones: the data and the geometry are the same, and only the coordinate system changes.
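
The pie-chart claim can be checked directly. In the sketch below (using the diamonds dataset from ggplot2 as an assumed example), the only difference between the two plots is the coordinate system; the counts computed for the bar layer are untouched.

```r
library(ggplot2)

# A single stacked bar: one x position, segments filled by cut
bar <- ggplot(diamonds, aes(x = "", fill = cut)) +
  geom_bar(width = 1)

# The same bar with height mapped to angle becomes a pie chart
pie <- bar + coord_polar(theta = "y")

# Both plots draw from identical computed counts
same_counts <- identical(layer_data(bar)$count, layer_data(pie)$count)
```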

5. Guides and Facets

Guides are the tools that allow the viewer to “read” the plot back into data. These include axes (which decode position) and legends (which decode color, size, or shape). Faceting is the process of splitting a dataset into subsets and creating a matrix of small, similar plots for each subset. This is powerful for comparing subsets defined by a categorical variable without the visual clutter of overlapping points.
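
A brief sketch of both ideas, assuming the mpg dataset from ggplot2: the legend title is set through a guide, and facet_wrap produces one panel per vehicle class.

```r
library(ggplot2)

p_facets <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  # Facet: one small panel per value of class
  facet_wrap(~class) +
  # Guide: legend that decodes the colour mapping
  guides(colour = guide_legend(title = "Drive train"))

# Each row of the built data records which panel it is drawn in
n_panels <- length(unique(layer_data(p_facets)$PANEL))
```

The axes are shared across panels by default, which is what makes the small multiples directly comparable.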

Conclusion

By understanding these components, we gain the ability to construct arbitrarily complex graphics. We are not limited to pre-set Excel templates; we can mix and match geometries, switch coordinate systems, and adjust scales to reveal the exact story hidden within the data.


Task 3: Complex Graphic in Practice

1. Data Preparation

For this task, I am using the diamonds dataset to create a complex visualization that employs multiple aesthetics and faceting.

Code
library(ggplot2)
library(dplyr)

# Subsetting for clearer visualization (random 1000 rows to reduce overplotting)
set.seed(123)
diamonds_subset <- diamonds %>% sample_n(1000)
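
Because the plot in the next step is faceted by clarity, it may be worth checking how the 1,000 sampled rows are spread across the clarity grades; rare grades may contribute only a handful of points to their panel, which makes a smoothed trend there less reliable. The check below is a suggested addition, not part of the original workflow, and the name clarity_counts is my own.

```r
library(ggplot2)
library(dplyr)

# Recreate the subset exactly as above
set.seed(123)
diamonds_subset <- diamonds %>% sample_n(1000)

# Number of sampled diamonds per clarity grade (one row per grade present)
clarity_counts <- diamonds_subset %>% count(clarity)
clarity_counts
```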

2. The Complex Graphic (Cartesian)

This plot embodies the Grammar of Graphics by layering points and trend lines, mapping color to a categorical variable (cut), and faceting by another (clarity).

Code
ggplot(diamonds_subset, aes(x = carat, y = price)) +
  # Geometry 1: Points with color mapped to Cut
  geom_point(aes(color = cut), alpha = 0.6, size = 2) +
  
  # Geometry 2: Smooth trend line
  geom_smooth(method = "loess", color = "black", se = FALSE, linewidth = 0.5) +
  
  # Faceting: Split the chart by Clarity
  facet_wrap(~clarity, nrow = 2) +
  
  # Scales
  scale_color_brewer(palette = "Dark2") +
  scale_y_continuous(labels = scales::dollar_format()) +
  
  # Guides
  labs(
    title = "Complex Relationship: Price vs Carat",
    subtitle = "Faceted by Clarity (Subset of n=1000)",
    x = "Carat (Weight)",
    y = "Price (USD)",
    color = "Cut Quality"
  ) +
  theme_minimal()

Figure 1: Diamond Price vs Carat, Faceted by Clarity

3. Variation: Polar Coordinates

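To demonstrate the flexibility of the grammar, I have redrawn a summary of the same dataset in polar coordinates. With theta mapped to y, each bar of a standard stacked bar chart becomes a concentric ring, giving a radial chart with one ring per cut. (A true “Coxcomb” chart would instead map theta to x, producing wedges.)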

Code
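# Stacked bars normalised to proportions, one bar per cut
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill") + 
  # Map the stacked proportions to angle, so each bar becomes a ring
  coord_polar(theta = "y") +
  labs(
    title = "Proportion of Clarity within Cuts",
    subtitle = "Polar Coordinate Variation",
    x = "",
    y = ""
  ) +
  # theme_void() removes axis text, so the rings are unlabelled;
  # they run from Fair (innermost) to Ideal (outermost)
  theme_void() +
  scale_fill_viridis_d()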

Figure 2: Proportion of Clarity within Cuts (Polar)

4. Discussion on Perception

The difference in perception between these two variants is significant.

The Cartesian scatter plot (Figure 1) allows for precise analytical comparison. We can easily see that as Carat increases, Price increases, and we can compare the slope of this increase across different Clarity facets. The axes provide a clear reference frame.

In contrast, the polar chart (Figure 2) is aesthetically pleasing and emphasises the part-to-whole nature of the data, but it distorts perception. Humans are generally better at comparing linear lengths (bars) than angles or arc lengths (polar segments). The concentric layout adds a further distortion: the same proportion spans a longer arc on an outer ring than on an inner one. While the polar chart does show that ‘Ideal’ cuts span a wide range of clarity grades, exact proportions are harder to read than on a standard stacked bar chart.
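
For a side-by-side comparison, the Cartesian counterpart of the polar chart can be sketched as follows. The axis labels and the percent formatting are my own additions for readability; the data, geometry, and fill scale match the polar version.

```r
library(ggplot2)

# Standard stacked bar chart of clarity proportions within each cut
p_cartesian <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  scale_fill_viridis_d() +
  labs(x = "Cut", y = "Share of diamonds", fill = "Clarity")

# Adding polar coordinates reproduces the ring layout from the same object
p_polar <- p_cartesian + coord_polar(theta = "y")
```

Here every bar has the same height, so segment lengths can be compared directly against a common axis.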


Task 4: Animation

1. Setup

I am using the {gganimate} package to visualize the gapminder dataset over time. As per the assignment requirement, I have added a “twist” by focusing on a specific global event: The 1994 Genocide in Rwanda.

Code
library(gganimate)
library(gapminder)

# Create a column to highlight Rwanda specifically
gapminder_twist <- gapminder %>%
  mutate(highlight = ifelse(country == "Rwanda", "Rwanda", "Other")) %>%
  arrange(highlight) # Ensure Rwanda plots on top

2. Animated Plot

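The animation below tracks life expectancy against GDP per capita. A static chart might simply show Rwanda as an outlier, but the animation shows roughly when the collapse happened and how sharply it departed from the global trend. Because gapminder records one observation every five years, the drop appears at the 1992 data point, and gganimate interpolates the frames in between.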

Code
p <- ggplot(gapminder_twist, aes(gdpPercap, lifeExp, size = pop, color = highlight)) +
  geom_point(alpha = 0.7) +
  
  # Highlighting Rwanda in Red, others in neutral Grey
  scale_color_manual(values = c("Other" = "grey80", "Rwanda" = "red")) +
  scale_size(range = c(2, 12), guide = "none") +
  scale_x_log10() +
  
  labs(title = 'Year: {frame_time}', 
       subtitle = 'Highlighting the impact of the 1994 crisis in Rwanda (Red)',
       x = 'GDP per capita', 
       y = 'Life Expectancy') +
  theme_minimal() +
  
  # Animation transitions
  transition_time(year) +
  ease_aes('linear')

# Render the animation
animate(p, renderer = gifski_renderer(), duration = 15, fps = 20, end_pause = 10)

Figure 3: Global Development, Highlighting the 1994 Drop in Rwanda

3. Reflection

Using animation adds a dimension of “storytelling” that static graphics lack. In this visualization, the “twist” is the sudden drop of the red dot (Rwanda) in the early 1990s.

In a static plot, this data point might just appear as a low outlier mixed in with other data. However, the animation forces the audience to witness the change. We see the country following the general trend, then suddenly crashing, before recovering. This movement elicits a stronger emotional and cognitive response, making the pattern of the event (the 1994 genocide) visually distinct from the general global trend of increasing health and wealth.
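
The timing of the drop can also be checked numerically. The sketch below pulls Rwanda's series out of the gapminder data and finds the year with the lowest recorded life expectancy; the object names rwanda and lowest are my own. As far as I understand the dataset, observations are five years apart, so the minimum falls at the 1992 observation rather than at 1994 itself.

```r
library(gapminder)

# Rwanda's life expectancy at each recorded year
rwanda <- subset(gapminder, country == "Rwanda", select = c(year, lifeExp))

# Row with the lowest recorded life expectancy
lowest <- rwanda[which.min(rwanda$lifeExp), ]
lowest

# Spacing between consecutive observations, in years
year_gaps <- diff(rwanda$year)
```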